Sample size determination for classifiers based on single-nucleotide polymorphisms.

Authors

  • Xinyu Liu
  • Yupeng Wang
  • Romdhane Rekaya
  • T N Sriram
Abstract

Single-nucleotide polymorphisms (SNPs), believed to determine human differences, are widely used to predict risk of diseases. Typically, clinical samples are limited and/or the sampling cost is high. Thus, it is essential to determine an adequate sample size needed to build a classifier based on SNPs. Such a classifier would facilitate correct classifications, while keeping the sample size to a minimum, thereby making the studies cost-effective. For coded SNP data from 2 classes, an optimal classifier and an approximation to its probability of correct classification (PCC) are derived. A linear classifier is constructed and an approximation to its PCC is also derived. These approximations are validated through a variety of Monte Carlo simulations. A sample size determination algorithm based on the criterion, which ensures that the difference between the 2 approximate PCCs is below a threshold, is given and its effectiveness is illustrated via simulations. For the HapMap data on Chinese and Japanese populations, a linear classifier is built using 51 independent SNPs, and the required total sample sizes are determined using our algorithm, as the threshold varies. For example, when the threshold value is 0.05, our algorithm determines a total sample size of 166 (83 for Chinese and 83 for Japanese) that satisfies the criterion.
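The stopping criterion described in the abstract (keep increasing the sample size until the difference between the two approximate PCCs falls below a threshold) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the PCC approximations, so `pcc_gap` is a hypothetical user-supplied function, `toy_gap` is a made-up placeholder, and equal class sizes are assumed, as in the 83 + 83 example.

```python
from typing import Callable


def smallest_balanced_sample_size(
    pcc_gap: Callable[[int], float],
    threshold: float,
    n_start: int = 2,
    n_max: int = 100_000,
) -> int:
    """Return the smallest per-class sample size n whose PCC gap is below threshold.

    pcc_gap(n) is assumed (hypothetically) to return the difference between the
    approximate PCC of the optimal classifier and that of the linear classifier
    built from n samples per class.
    """
    for n in range(n_start, n_max + 1):
        if pcc_gap(n) < threshold:
            return n
    raise ValueError("no sample size up to n_max satisfies the criterion")


def toy_gap(n: int) -> float:
    # Placeholder only: NOT the approximation derived in the paper.
    return 1.0 / n


n_per_class = smallest_balanced_sample_size(toy_gap, threshold=0.05)
total_sample_size = 2 * n_per_class  # two classes of equal size (assumption)
print(n_per_class, total_sample_size)
```

A plain linear scan is used so that the sketch does not rely on the gap decreasing monotonically in n; with a real, monotone approximation a bisection search would reach the same answer faster.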


Similar resources

Single Nucleotide Polymorphism in the Dopamine Receptor Type 3 (DOP3) Candidate Gene Associated with Varroa Destructor Resistance in Honeybee

Extended Abstract Introduction and Objective: Varroa infestation is undoubtedly the greatest threat and challenge facing apiculture today. This external parasite inevitably lives in the bee colony and causes irreparable damage to the colony and its subsequent honey production. One of the proposed strategies in this regard is the use of pesticides, which have a negative impact on the health of ...


Association of two single nucleotide polymorphisms rs10407022 and rs3741664 with the risk of primary ovarian insufficiency in a sample of Iraqi women

Primary ovarian insufficiency (POI) can be a devastating disease affecting women below the age of forty. It involves a major decrease in the number and quality of oocytes, that is, in a woman's ovarian reserve. The main objective of this study is to determine the distribution of the single-nucleotide polymorphisms rs10407022 and rs3741664 in Iraqi people and their association with primary ovarian insufficiency. The m...


Novel Single Nucleotide Polymorphisms (SNPs) in Two Oogenesis Specific Genes (BMP15, GDF9) and Their Association with Litter Size in Markhoz Goat (Iranian Angora)

BMP15 and GDF9 are two oogenesis-specific genes that play a pivotal role in female fertility in mammals and have potential for improving prolificacy through marker-assisted selection. The aim of the present research was to investigate variation in BMP15 and GDF9 and the association of their polymorphisms with litter size in Markhoz goats. The sequence variability of the different amplified fragments utilized for g...


Single Nucleotide Polymorphisms (SNPs) of GDF9 Gene in Bahmaei and Lak Ghashghaei Sheep Breeds and Its Association with Litter Size

Growth differentiation factor 9 (GDF9) belongs to the transforming growth factor β superfamily, is highly expressed in the oocytes of growing ovarian follicles, and has been strongly related to fecundity traits in sheep. The GDF9 gene could therefore serve as a genetic marker for improving reproductive performance in sheep. The aim of this study was to invest...


The Single Nucleotide Polymorphisms in the C-reactive Protein Gene: are they Biomarkers of Cardiovascular Risk?

Recent pre-clinical and clinical studies have revealed that the C-reactive protein gene (CRP) is related to the degree of acute rise in plasma C-reactive protein (CRP) levels. Moreover, single nucleotide polymorphisms (SNPs) in the CRP gene may be associated with increased risk of cancer, atherosclerosis, diabetes mellitus, bowel disease, rheumatoid arthritis, psoriasis, obstructive pulmonary disease,...



Journal title:
  • Biostatistics

Volume 13, Issue 2

Pages: -

Publication date: 2012